Count-Min-Log sketch: Approximately counting with approximate counters
Abstract
Count-Min Sketch [1] is a widely adopted algorithm for approximate event counting in large-scale processing. However, the original version of the Count-Min Sketch (CMS) suffers from some deficiencies, especially when one is interested in low-frequency items, as in text-mining-related tasks. Several variants of CMS [5] have been proposed to compensate for the high relative error on low-frequency events, but the proposed solutions tend to correct the errors instead of preventing them. In this paper, we propose the Count-Min-Log sketch, which uses logarithm-based, approximate counters [7, 4] instead of linear counters to improve the average relative error of CMS at constant memory footprint.
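To make the idea of replacing linear counters with logarithm-based ones concrete, here is a minimal Python sketch. It is not the authors' implementation: the class name `CountMinLogSketch`, the default `base` of 1.08, the BLAKE2b-based row hashing, and the conservative-update rule (only cells holding the current minimum are incremented) are all assumptions chosen for illustration. Each cell stores a small exponent `c`; an increment succeeds with probability `base ** -c`, and a stored exponent is decoded as `(base ** c - 1) / (base - 1)`.

```python
import hashlib
import random


class CountMinLogSketch:
    """Illustrative Count-Min sketch whose cells are log-scale (Morris-style) counters."""

    def __init__(self, width=1024, depth=4, base=1.08, seed=0):
        # width, depth, base and seed are hypothetical defaults, not values from the paper.
        self.width = width
        self.depth = depth
        self.base = base
        self.rng = random.Random(seed)
        self.table = [[0] * width for _ in range(depth)]

    def _columns(self, item):
        # One column per row, obtained by salting BLAKE2b with the row index.
        data = item.encode("utf-8")
        cols = []
        for row in range(self.depth):
            digest = hashlib.blake2b(
                data, digest_size=8, salt=row.to_bytes(8, "little")
            ).digest()
            cols.append(int.from_bytes(digest, "little") % self.width)
        return cols

    def _decode(self, c):
        # Convert a stored exponent back to an estimated linear count.
        return (self.base ** c - 1.0) / (self.base - 1.0)

    def add(self, item):
        cols = self._columns(item)
        current = min(self.table[r][col] for r, col in enumerate(cols))
        # Probabilistic increment: the larger the exponent, the rarer the update.
        if self.rng.random() < self.base ** -current:
            for r, col in enumerate(cols):
                # Conservative update (assumption): only raise cells at the minimum.
                if self.table[r][col] == current:
                    self.table[r][col] = current + 1

    def estimate(self, item):
        cols = self._columns(item)
        current = min(self.table[r][col] for r, col in enumerate(cols))
        return self._decode(current)


sketch = CountMinLogSketch()
for _ in range(1000):
    sketch.add("frequent")
sketch.add("rare")
print(sketch.estimate("frequent"), sketch.estimate("rare"))
```

Because each cell only needs to hold an exponent, it can be stored in fewer bits than a linear counter, which is what allows more cells in the same memory budget. The price is extra variance on each individual estimate, controlled here by `base`.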
Similar resources
Count-Min Tree Sketch: Approximate counting for NLP
The Count-Min Sketch [1] is a widely adopted structure for approximate event counting in large-scale processing. In a previous work [7] we improved the original version of the Count-Min Sketch (CMS) with conservative update using approximate counters [6, 4] instead of linear counters. These structures are computationally efficient and improve the average relative error (ARE) of a CMS at constant...
Approximate Scalable Bounded Space Sketch for Large Data NLP
We exploit sketch techniques, especially the Count-Min sketch, a memory- and time-efficient framework that approximates the frequency of a word pair in the corpus without explicitly storing the word pair itself. These methods use hashing to deal with massive amounts of streaming text. We apply the Count-Min sketch to approximate word pair counts and exhibit their effectiveness on three important NL...
Sketching Techniques for Large Scale NLP
In this paper, we address the challenges posed by large amounts of text data by exploiting the power of hashing in the context of streaming data. We explore sketch techniques, especially the CountMin Sketch, which approximates the frequency of a word pair in the corpus without explicitly storing the word pairs themselves. We use the idea of a conservative update with the Count-Min Sketch to red...
A Coin Tossing Algorithm for Counting Large Numbers of Events
"Approximate counters" are realized by probabilistic algorithms that maintain an approximate count in the interval 1 to n using only about $\log_2 \log_2 n$ bits. The algorithmic principle was proposed by R. Morris [7]: Starting with counter C = 1, after n increments C should contain a good approximation to $\log_2 n$. Thus C should be increased by 1 after other n increments approximately. Since ...
Approximate Counting: A Detailed Analysis
Approximate counting is an algorithm proposed by R. Morris which makes it possible to keep approximate counts of large numbers in small counters. The algorithm is useful for gathering statistics of a large number of events as well as for applications related to data compression (Todd et al.). We provide here a complete analysis of approximate counting which establishes good convergence properti...
Journal title:
- CoRR
Volume abs/1502.04885
Publication date 2015